It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
When scaled to hundreds of billions of parameters, pretrained language models such as GPT-3 (Brown et al., 2020) achieve remarkable few-shot performance. However, enormous amounts of compute are required for training and applying such big models, resulting in a large carbon footprint and making it difficult for researchers and practitioners to use them. We show that performance similar to GPT-3 can be obtained with language models that are much “greener” in that their parameter count is several orders of magnitude smaller. This is achieved by converting textual inputs into cloze questions that contain a task description, combined with gradient-based optimization; exploiting unlabeled data gives further improvements. We identify key factors required for successful natural language understanding with small language models.
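The cloze-conversion idea above can be illustrated with a minimal sketch: a classification input is rewritten as a cloze question, and label words ("verbalizers") stand in for class labels at the masked position. The pattern and verbalizer choices below are illustrative assumptions, not the paper's actual templates.

```python
# Sketch of a pattern-verbalizer pair for sentiment classification.
# A masked language model scores candidate tokens at [MASK]; gradient-based
# optimization then fine-tunes on a few labeled cloze examples.

def to_cloze(text: str, mask_token: str = "[MASK]") -> str:
    """Wrap an input in a cloze pattern that implies the task description."""
    return f"Review: {text} It was {mask_token}."

# Hypothetical verbalizer: maps each class label to a single label word.
VERBALIZER = {"positive": "great", "negative": "terrible"}

def label_to_token(label: str) -> str:
    """Return the token the language model is asked to predict for a label."""
    return VERBALIZER[label]

cloze = to_cloze("The plot was gripping from start to finish.")
# The model compares p(mask = "great") vs p(mask = "terrible") to classify.
```

The key point is that the task is expressed in the model's native objective (filling a blank), so far fewer parameters and labeled examples are needed than for GPT-3-style in-context learning.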
A Nested Attention Neural Hybrid Model for Grammatical Error Correction
Grammatical error correction (GEC) systems strive to correct both global
errors in word order and usage, and local errors in spelling and inflection.
Further developing upon recent work on neural machine translation, we propose a
new hybrid neural model with nested attention layers for GEC. Experiments show
that the new model can effectively correct errors of both types by
incorporating word and character-level information, and that the model
significantly outperforms previous neural models for GEC as measured on the
standard CoNLL-14 benchmark dataset. Further analysis also shows that the
superiority of the proposed model can be largely attributed to the use of the
nested attention mechanism, which has proven particularly effective in
correcting local errors that involve small edits in orthography.
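The nesting of word-level and character-level attention can be sketched as follows. The shapes, the per-word character attention, and the additive combination are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(query, keys, values):
    """Standard dot-product attention of one query over one sequence."""
    scores = keys @ query                      # (seq_len,)
    weights = softmax(scores)                  # attention distribution
    return weights @ values                    # weighted sum: (dim,)

def nested_attention(query, word_keys, word_values, char_keys, char_values):
    """Toy nested attention: a word-level pass, plus a nested pass over each
    word's character representations, combined additively (an assumption)."""
    # outer attention over word embeddings
    word_ctx = attend(query, word_keys, word_values)
    # inner attention over each word's characters yields one vector per word
    char_ctx = np.stack([attend(query, ck, cv)
                         for ck, cv in zip(char_keys, char_values)])
    # outer attention again, this time over the character-derived contexts
    nested_ctx = attend(query, char_ctx, char_ctx)
    return word_ctx + nested_ctx
```

The character-level pass is what lets a model of this shape propose small orthographic edits (spelling, inflection) that a purely word-level attention cannot represent for rare or misspelled words.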
Generating recommendations for entity-oriented exploratory search
We introduce the task of recommendation set generation for entity-oriented
exploratory search. Given an input search query which is open-ended or
under-specified, the task is to present the user with an easily-understandable
collection of query recommendations, with the goal of facilitating domain
exploration or clarifying user intent. Traditional query recommendation systems
select recommendations by identifying salient keywords in retrieved documents,
or by querying an existing taxonomy or knowledge base for related concepts. In
this work, we build a text-to-text model capable of generating a collection of
recommendations directly, using the language model as a "soft" knowledge base
capable of proposing new concepts not found in an existing taxonomy or set of
retrieved documents. We train the model to generate recommendation sets which
optimize a cost function designed to encourage comprehensiveness,
interestingness, and non-redundancy. In thorough evaluations performed by crowd
workers, we confirm the generalizability of our approach and the high quality
of the generated recommendations.
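A toy version of scoring candidate recommendation sets against the criteria named above (set coverage as a proxy for comprehensiveness, pairwise overlap as redundancy) might look like the sketch below. The scoring functions, weights, and example queries are illustrative assumptions, not the paper's actual cost function.

```python
# Sketch: a cost over a candidate recommendation set, lower is better.
# Comprehensiveness is approximated by penalizing undersized sets;
# non-redundancy by penalizing pairwise token overlap (Jaccard).

def redundancy(recs):
    """Mean pairwise Jaccard token overlap between recommendations."""
    toks = [set(r.lower().split()) for r in recs]
    pairs = [(a, b) for i, a in enumerate(toks) for b in toks[i + 1:]]
    if not pairs:
        return 0.0
    return sum(len(a & b) / len(a | b) for a, b in pairs) / len(pairs)

def cost(recs, target_size=5, w_size=1.0, w_red=2.0):
    """Combine a size penalty (comprehensiveness proxy) with redundancy."""
    size_penalty = max(0, target_size - len(recs)) / target_size
    return w_size * size_penalty + w_red * redundancy(recs)

# Hypothetical recommendation sets for an open-ended query about jazz:
diverse = ["jazz history", "famous saxophonists", "bebop era",
           "jazz festivals", "modal jazz"]
overlapping = ["jazz music", "jazz music history", "history of jazz music",
               "jazz", "the jazz"]
```

In a setup like the one described, a generator would be trained so that the sets it emits score well under such a cost, rather than the cost being applied only as a post-hoc filter.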
- …